Outlier Cluster Formation in Spectral Clustering

Authors

  • Takuro Ina
  • Atsushi Hashimoto
  • Masaaki Iiyama
  • Hidekazu Kasahara
  • Mikihiko Mori
  • Michihiko Minoh
Abstract

Outlier detection and cluster number estimation are important issues when clustering real data. This paper focuses on spectral clustering, a time-tested clustering method, and reveals important properties of it related to outliers. The highlights of this paper are the following two mathematical observations: first, spectral clustering's intrinsic property of forming an outlier cluster, and second, the singularity of an outlier cluster with a valid cluster number. Based on these observations, we designed a function that evaluates clustering and outlier detection results. In experiments, we prepared two scenarios: face clustering in a photo album and person re-identification in a camera network. We confirmed that the proposed method detects outliers and estimates the number of clusters properly in both problems. Our method outperforms state-of-the-art methods in both the 128-dimensional sparse space for face clustering and the 4,096-dimensional nonsparse space for person re-identification.
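As a rough illustration of why an isolated point tends to end up in a cluster of its own under spectral clustering, the sketch below uses a standard fact about graph Laplacians rather than the evaluation function proposed in the paper: the number of (near-)zero eigenvalues of the unnormalized Laplacian L = D - W equals the number of (nearly) disconnected components of the affinity graph. The toy data, the Gaussian affinity with `sigma=1.0`, the tolerance, and the helper names `rbf_affinity` and `count_near_zero_eigenvalues` are all assumptions made for this example.

```python
import numpy as np

def rbf_affinity(X, sigma=1.0):
    """Gaussian (RBF) affinity matrix with a zero diagonal (hypothetical helper)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

def count_near_zero_eigenvalues(X, sigma=1.0, tol=1e-8):
    """Count eigenvalues of the unnormalized Laplacian L = D - W below tol."""
    W = rbf_affinity(X, sigma)
    L = np.diag(W.sum(axis=1)) - W
    eigvals = np.linalg.eigvalsh(L)  # L is symmetric, so eigvalsh applies
    return int(np.sum(eigvals < tol))

# Two tight, well-separated toy clusters (assumed data, not from the paper).
cluster_a = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
cluster_b = cluster_a + np.array([10.0, 0.0])
inliers = np.vstack([cluster_a, cluster_b])

# A single point far from both clusters plays the role of the outlier.
outlier = np.array([[5.0, 20.0]])
with_outlier = np.vstack([inliers, outlier])

n_without = count_near_zero_eigenvalues(inliers)
n_with = count_near_zero_eigenvalues(with_outlier)
print("near-zero eigenvalues without outlier:", n_without)
print("near-zero eigenvalues with outlier:   ", n_with)
```

In this toy setting the outlier contributes one additional near-zero eigenvalue, so a spectral method asked for one extra cluster would be expected to isolate it. Real data are rarely this cleanly separated, which is the situation the paper's evaluation function is meant to address.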

Similar resources

A Spectral Clustering Based Outlier Detection Technique

Outlier detection shows its increasingly high practical value in many application areas such as intrusion detection, fraud detection, discovery of criminal activities in electronic commerce and so on. Many techniques have been developed for outlier detection, including distribution-based outlier detection algorithm, depth-based outlier detection algorithm, distance-based outlier detection algor...

Ranking Overlap and Outlier Points in Data using Soft Kernel Spectral Clustering

Soft clustering algorithms can handle real-life datasets better as they capture the presence of inherent overlapping clusters. A soft kernel spectral clustering (SKSC) method proposed in [1] exploited the eigen-projections of the points to assign them different cluster membership probabilities. In this paper, we detect points in dense overlapping regions as overlap points. We also identify the ...

Approximate Document Outlier Detection Using Random Spectral Projection

Outlier detection is an important process for text document collections, but as the collection grows, the detection process becomes a computationally expensive task. Random projection has been shown to provide a good fast approximation of sparse data, such as document vectors, for outlier detection. The random samples of Fourier and cosine spectrum have been shown to provide good approximations of sparse...

Applying Constrained Clustering for Active Exploration of Music Collections

In this paper we investigate the capabilities of constrained clustering in application to active exploration of music collections. Constrained clustering has been developed to improve clustering methods through pairwise constraints. Although these constraints are received as queries from a noiseless oracle, most of the methods involve a random procedure stage to decide which elements are presen...

Outlier Detection Using Enhanced K-means Clustering Algorithm and Weight Based Center Approach

In data mining, many methods detect outliers by first clustering the data and then identifying the outliers within the clusters. In general, clustering plays a very important role in data mining. Clustering means grouping similar data objects together based on the characteristics they possess. Outlier detection is an important issue in data mining; particularly it...

Journal:
  • CoRR

Volume abs/1703.01028

Publication date: 2017